A Fuzzy K-nn Approach for Cancer Diagnosis with Microarray Gene Expression Data
Abstract
Recent advances in DNA microarray technology have made it possible to measure the expression levels of several thousand genes simultaneously. Gene expression profiles obtained with microarray techniques have opened the way to early cancer diagnosis using supervised learning algorithms. As a simple, effective, and nonparametric classification method, the k-Nearest Neighbor (k-NN) algorithm has recently been applied to the problem of cancer diagnosis and categorization. An obvious problem with the traditional k-NN algorithm is that, when the density of the training data is uneven, classification precision may drop because the algorithm considers only which samples are the k nearest neighbors and ignores the differences in their distances. A recent solution to this problem is to adopt the theory of fuzzy sets and construct a new membership function based on similarities. This study was conducted to demonstrate to what degree the fuzzification of the k-NN algorithm can improve the prediction accuracy of cancer classification based on gene expression data. Experiments over six distinct benchmark datasets spanning 27 diagnostic categories show that the fuzzy k-NN algorithm improves the accuracy of cancer classification to a certain degree. The results also encourage the use of this fuzzification technique on similar problems in computational biology.
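The contrast the abstract draws between crisp and fuzzy k-NN can be illustrated with a minimal sketch. The abstract does not give the study's membership function, so the code below assumes a common inverse-distance formulation with crisp training labels and a fuzzifier `m`. The function names, parameters, and toy two-gene expression values are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def crisp_knn_predict(train_X, train_y, query, k=3):
    """Traditional k-NN: majority vote among the k nearest samples."""
    ranked = sorted(zip(train_X, train_y), key=lambda t: math.dist(t[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def fuzzy_knn_predict(train_X, train_y, query, k=3, m=2.0, eps=1e-12):
    """Fuzzy k-NN sketch (assumed inverse-distance membership, not the paper's).

    Returns the predicted label and a dict of class memberships summing to 1.
    """
    ranked = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda t: t[0],
    )
    exponent = 2.0 / (m - 1.0)  # m > 1 controls how sharply distance is penalized
    memberships = defaultdict(float)
    total = 0.0
    for dist, label in ranked[:k]:
        weight = 1.0 / (dist ** exponent + eps)  # closer neighbors weigh more
        memberships[label] += weight
        total += weight
    memberships = {label: w / total for label, w in memberships.items()}
    return max(memberships, key=memberships.get), memberships


# Hypothetical toy data: two-gene expression vectors with unevenly spread classes.
X = [(0.1, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
y = ["B", "A", "A", "B"]
query = (0.0, 0.0)

print(crisp_knn_predict(X, y, query, k=3))
print(fuzzy_knn_predict(X, y, query, k=3))
```

In this toy case the single very close sample of class B is outvoted by two more distant samples of class A under the crisp rule, whereas the distance-weighted memberships favor B. This is only one way to fuzzify the vote; formulations that also assign fuzzy memberships to the training samples are not shown here.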
Similar resources
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method
Background: Microarray experiments can simultaneously determine the expression of thousands of genes. Identification of potential genes from microarray data for diagnosis of cancer is important. This study aimed to identify genes for the diagnosis of acute myeloid and lymphoblastic leukemia using a sparse feature selection method. Materials and Methods: In this descriptive study, the expressio...
Diagnosis of Breast Cancer Subtypes using the Selection of Effective Genes from Microarray Data
Introduction: Early diagnosis of breast cancer and the identification of effective genes are important issues in the treatment and survival of the patients. Gene expression data obtained using DNA microarray in combination with machine learning algorithms can provide new and intelligent methods for diagnosis of breast cancer. Methods: Data on the expression of 9216 genes from 84 patients across...
Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest
Background & objective: Microarray and next generation sequencing (NGS) data are the important sources to find helpful molecular patterns. Also, the great number of gene expression data increases the challenge of how to identify the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...
SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy
In this paper, we propose a new gene selection algorithm based on Shuffled Frog Leaping Algorithm that is called SFLA-FS. The proposed algorithm is used for improving cancer classification accuracy. Most of the biological datasets such as cancer datasets have a large number of genes and few samples. However, most of these genes are not usable in some tasks for example in cancer classification....